R Data Analysis Cookbook by Unknown
Author:Unknown
Publisher: Packt Publishing
How it works...
In step 1 the data is read and in step 2 we define the convenience function for scaling a set of variables in a data frame.
In step 3 the convenience function is used to scale only the variables of interest. We leave out the No, model_year, and car_name variables.
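The recipe's own function and data file are not reproduced here, so the following is only a minimal sketch of the idea in steps 2 and 3. It assumes the built-in mtcars data frame as a stand-in; the helper name scale_columns and the column selection in vars are hypothetical.

```r
# Hypothetical helper: standardize only the named columns of a data frame
scale_columns <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) (x - mean(x)) / sd(x))
  df
}

# Stand-in data (assumption): built-in mtcars, not the recipe's file
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
cars_scaled <- scale_columns(mtcars, vars)

# The selected columns now have mean 0 and standard deviation 1;
# columns not listed in vars (for example, gear) are left untouched
round(colMeans(cars_scaled[vars]), 10)
sapply(cars_scaled[vars], sd)
```

Base R's scale function does the same centering and scaling on a numeric matrix or data frame, so a helper like this mainly saves typing when only some columns should be scaled.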
In step 4 the distance matrix is created based on the standardized values of the relevant variables. We have computed Euclidean distances; other possibilities are: maximum, manhattan, canberra, binary, and minkowski.
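As an illustration of step 4, the sketch below builds a Euclidean and a Manhattan distance matrix with dist and checks one Euclidean entry by hand. The mtcars data and the vars selection are assumptions standing in for the recipe's data.

```r
# Stand-in data (assumption): standardized columns of built-in mtcars
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
cars_scaled <- scale(mtcars[vars])

# Pairwise distances between rows under two of the supported metrics
d_euc <- dist(cars_scaled, method = "euclidean")
d_man <- dist(cars_scaled, method = "manhattan")

# Check the Euclidean distance between the first two cases by hand
x <- cars_scaled[1, ]
y <- cars_scaled[2, ]
manual <- sqrt(sum((x - y)^2))
all.equal(as.matrix(d_euc)[1, 2], manual)
```

The object returned by dist stores only the lower triangle; as.matrix expands it into a full square matrix when individual entries need to be inspected.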
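In step 5 the distance matrix is passed to the hclust function to create the clustering model. We specified method = "ward" to use Ward's method, which tries to get compact spherical clusters. In R 3.1.0 and later this option is named "ward.D", and "ward.D2" is available as a variant that squares the dissimilarities before updating the clusters. The hclust function also supports single, complete, average, mcquitty, median, and centroid.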
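A minimal sketch of step 5, again assuming mtcars as stand-in data. It uses method = "ward.D", the name current versions of R use for the option the text calls "ward", and fits a second model with complete linkage for comparison.

```r
# Stand-in data (assumption): standardized columns of built-in mtcars
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
d <- dist(scale(mtcars[vars]))

# Ward's method under its current name, plus complete linkage for comparison
hc_ward <- hclust(d, method = "ward.D")
hc_complete <- hclust(d, method = "complete")

# Each row of merge records one agglomeration step:
# negative entries are single cases, positive entries are earlier clusters
head(hc_ward$merge)
hc_ward$method
```

With n cases the model always records n - 1 merges, whichever linkage method is chosen; the methods differ in which pair of clusters is joined at each step and at what height.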
In step 6 the resulting dendrogram is plotted. We specified labels=FALSE because we have too many cases and printing them will only add clutter. With a smaller dataset, using labels = TRUE will make sense. The hang argument controls the distance from the bottom of the dendrogram to the labels. Since we are not using labels, we specified hang = 0 to prevent numerous vertical lines below the dendrogram.
The dendrogram shows all the cases at the bottom (too numerous to distinguish in our plot) and shows the step-by-step agglomeration of the clusters. To obtain a desired number of clusters, say K, we draw a horizontal line at a height where it cuts across exactly K vertical lines of the dendrogram; the cases hanging below each cut line form one cluster.
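The horizontal-line idea can be checked numerically. In the sketch below (mtcars as assumed stand-in data, K = 3 chosen arbitrarily), a cut height is picked between two consecutive merge heights so that exactly K branches cross it, and the result is compared with asking cutree for K clusters directly.

```r
# Stand-in data (assumption): standardized columns of built-in mtcars
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
hc <- hclust(dist(scale(mtcars[vars])), method = "ward.D")

n <- nrow(mtcars)
K <- 3  # arbitrary choice for illustration

# After n - K merges there are K clusters, so a line drawn between
# merge heights n - K and n - K + 1 crosses exactly K vertical lines
cut_height <- mean(hc$height[c(n - K, n - K + 1)])

by_height <- cutree(hc, h = cut_height)
by_count <- cutree(hc, k = K)

length(unique(by_height))
all(by_height == by_count)
```

On a plotted dendrogram, abline(h = cut_height) would draw the corresponding horizontal line.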
Step 7 shows how to use the rect.hclust function to demarcate the cases comprising the various clusters for a selected value of k.
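Steps 6 and 7 might look like the following sketch, which plots the dendrogram without labels and then outlines the clusters for k = 3. The mtcars data, the value of k, and the border color are assumptions made for illustration.

```r
# Stand-in data (assumption): standardized columns of built-in mtcars
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
hc <- hclust(dist(scale(mtcars[vars])), method = "ward.D")

# Draw the dendrogram with no case labels and no hanging stems
plot(hc, labels = FALSE, hang = 0)

# Outline the k clusters on the existing plot; the return value is a
# list with one vector of case indices per rectangle
boxes <- rect.hclust(hc, k = 3, border = "red")
lengths(boxes)
```

rect.hclust draws on the current plot, so it has to be called after plot.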
Step 8 shows how we can use the cutree function to identify, for a specific K, which cluster each case of our data belongs to.
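A minimal sketch of step 8, assuming mtcars as stand-in data and k = 3: cutree returns one cluster number per case, which can then be tabulated or attached to the data for profiling.

```r
# Stand-in data (assumption): standardized columns of built-in mtcars
vars <- c("mpg", "cyl", "disp", "hp", "wt", "qsec")
hc <- hclust(dist(scale(mtcars[vars])), method = "ward.D")

# Cluster membership for each case, in the original row order
groups <- cutree(hc, k = 3)

# Cluster sizes
table(groups)

# Attach the membership to the unscaled data and profile each cluster
clustered <- cbind(mtcars[vars], cluster = groups)
aggregate(. ~ cluster, data = clustered, FUN = mean)
```

Profiling on the unscaled values keeps the cluster means in the original units, which is usually easier to interpret than standardized scores.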
